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ABSTRACT: A common situation in experimental physics is to have a signal which can not be 
separated from a non-interfering background through the use of any selection criteria. In this paper, 
we describe a procedure for determining, on an event-by-event basis, a quality factor (<2-factor) that 
a given signal candidate originated from the signal sample. This procedure generalizes the "side- 
band" subtraction method to higher dimensions without requiring the data to be divided into bins. 
The <2-factors can then be used as event weights in subsequent analysis procedures, allowing one 
to more directly access the true spectrum of the signal. 
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1. Introduction 

A common situation in many experiments is the presence of a background which can not be cleanly 
separated from the desired signal. If one has a priori knowledge of certain features of the signal 
and background, algorithms can be created to separate the two types of data. Procedures have 
been developed to handle many of these situations (see, e.g., OE Q, 0D- Two of the more common 
methods for performing this type of data classification are neural networks and decision trees. In 
both procedures, the information known about the signal and background is used to learn how to 
optimize signal-background separation. 

Consider now the case where the distributions of the signal and background are not known, 
and in fact, it is these distributions which we want to measure. Of particular interest is the presence 
of an irreducible background, i.e. one which can not be reduced using any selection criteria. An 
example of an irreducible background to the yp — > pco, ft) — ► 7r + 7r~7T reaction (discussed below) 
would be non-ft) yp — ► p7t + 7i~7i° events. For any individual signal candidate, it is impossible to 
distinguish between these two types of data; thus, this type of background can not be reduced 
through the use of any selection criteria. As an example of a reducible background to this reaction, 
consider events where the wrong beam photon has been associated with the tracks in the detector. 
This type of background may be reducible by examining the timing information of the beam and 
the tracks. 

Perhaps the simplest method for handling an irreducible background is "side -band" subtrac- 
tion. In this procedure, distributions constructed using data from outside the signal region are 
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subtracted from those built using data from inside the signal region in order to remove the back- 
grounds. While this method can be effective in some situations, implementing it can become 
problematic if the kinematics of the background region are different than those of the signal region 
or if the problem is sufficiently multi-dimensional such that binning the data is severely limited by 
statistical uncertainties. 

In this paper, we describe a procedure for generalizing the one-dimensional side-band sub- 
traction method to higher dimensions without having to bin the data. Our method involves using 
nearest neighbor events to assign each signal candidate (henceforth referred to as an "event") in a 
data sample a quality factor (Q-factor) which gives the probability that it originates from the signal 
sample. The data are assumed to be described by a set of coordinates which can be masses, an- 
gles, energies, etc. The distributions of the signal and background must be known (possibly with 
unknown parameters) in a subset of the coordinates, referred to as the reference coordinates. No 
a priori information concerning the signal or background distributions in any other coordinate is 
required. Thus, parametrizations of the signal and background are not necessary in any of the non- 
reference coordinates. The unknown parameters in the signal and background reference-coordinate 
probability density functions (PDFs) are determined locally in the non-reference coordinates, i.e. 
they are allowed to vary according to the non-reference coordinates. No information, not even 
parametrizations, about how these parameters depend on the non-reference coordinates is required 
(see Section @ for more details). This is an extremely useful property of this method and one that 
makes its use in certain analyses advantageous over many other methods. Once the <2-factors are 
obtained, the data in the side bands can be discarded (this may be desirable in some analyses). 

The <2-factors can be used as event weights in subsequent analysis procedures to gain access 
to the signal distribution. For example, the Q-factors can be used in an event-based unbinned 
maximum likelihood fit performed on the data to extract physical observables. By computing the 
weighted sum of log likelihoods, the background subtraction is carried out automatically in the fit 
without ever having to resort to dividing the data up into bins. Eliminating the need to bin the data 
is highly desirable for the case of multi-dimensional problems. 

In this paper we assume that no correlation exists between the reference coordinates and the 
remaining set of coordinates. In principle, this limitation could be overcome in some analyses (see 
Section @). It is also assumed that there is no quantum mechanical interference between the signal 
and background. One final assumption is that the signal and background distributions do not vary 
rapidly in the non-reference coordinates relative to the correlation distances, i.e. the diameters of 
the hyper-spheres used to collect the nearest neighbor events. A similar constraint exists in binned 
analyses. If a distribution varies rapidly relative to the bin width, then some of the finer structure 
present in the distribution will be lost. This same "averaging" effect can also occur in our method 
if the signal distribution possesses rapid variations in the non-reference coordinates. 

The idea of using nearest neighbor events as a means of data classification is not new (see, 
e.g., ED)- Other methods also exist which exploit the concept of reference coordinates to separate 
out contributions from different sources to a data set. A good example is the s&lot technique 05fl. 
The method presented in this paper is unique in that it combines these ideas to carry out an unbinned 
side-band subtraction which results in each event obtaining an event weight. The only information 
required as input are parametrizations of the signal and background in terms of the reference co- 
ordinates. No a priori knowledge about the signal or background in the remaining coordinates is 
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necessary. Another advantage of this method is that it permits the presence of unknown param- 
eters in the signal and background reference-coordinate PDFs. These parameters are determined 
locally in the non-reference coordinates; thus, they are allowed to vary in these coordinates and 
no parametrization of these variations is required. The event weights obtained can be used in an 
unbinned fit to extract physical observables from the data. 

As an example, in Section || we will consider the reaction yp — > poo, ft) — > n + n~ n°. There are, 
of course, production mechanisms other than yp — ► pco which produce the same final state and no 
selection criteria exists which can separate out the yp — > poo events (the background is irreducible). 
The only knowledge we have about the background is that it can be parametrized by a polynomial 
(whose parameters are unknown) in the three-pion mass. The distribution of the background in all 
other variables is completely unknown. The goal of our model analysis will be to measure the ft) 
polarization observables known as spin density matrix elements, which can be extracted from the 
distribution of the pions in the ft) rest frame. Ideally, we would like to avoid binning the data and 
to extract these observables from an event-based maximum likelihood fit. The method presented in 
this paper will allow us to perform such an analysis. 

2. Quality Factor Determination 

Consider a data set composed of n total events, each of which is described by m coordinates £ 
(m > 2). Furthermore, the data set consists of n s events which are signal and r# events which are 
background. Both the signal and background distributions are functions of the coordinates, S(£) 
and B(%), respectively. For this procedure, we need to know the functional dependence (unknown 
parameters are permitted) of the signal and background distributions in terms of (at least) one of 
the coordinates. We will refer to this coordinate as the reference coordinate and label it E, r . It is 
trivial to extend this method to consider any number of reference coordinates if necessary. 

As an example, consider the case where the reference coordinate is a mass. The functional 
dependence of the signal, in terms of might be given by a Gaussian or Breit-Wigner distribution. 
The background may be well represented by a polynomial. In both cases, there could be unknown 
parameters (e.g. the width of the Gaussian); these are permitted when using this procedure. These 
unknown parameters can vary in terms of the non-reference coordinates. For example, the width 
of a Gaussian describing the mass of a composite particle may depend on the lab angles of its 
decay products. The unknown parameters in the PDFs are determined locally in the non-reference 
coordinates; thus, these types of variations are handled automatically by the method. No other 
a priori information is required concerning the dependence of S(%) or fi(^) on any of the other 
coordinates. 

The aim of this procedure is to assign each event a quality factor, or Q-factor, which gives 
the probability that it originates from the signal sample. We first need to define a metric for the 
space spanned by t, (excluding <§ r ). A reasonable choice is to use 8u / where a* is the root mean 
square (RMS) of the k th variable in the appropriate phase-space distribution (see Section |5] for some 
examples). This gives equal weight to each variable. Some care should be taken when choosing a 
metric and the choice presented here may not be the optimal one for all analyses. Discussion on 
this topic can be found at the end of this section. Using this metric, the distance between any two 
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events, dy, is given as 



4=1 

k^r 



(2.1) 



where the sum is over all coordinates except t, r . This is known as the normalized Euclidean dis- 
tance. 

For each event, we compute the distance to all other events in the data set, and retain the n c 
nearest neighbor events, including the event itself, according to Eq. (\Z. 1| ). The value of n c , which 
varies depending on the analysis, is discussed below and in Section [5]3]. It is worth noting that the 
limit n c — ► n is equivalent to performing a global side-band subtraction {i.e. determining the total 
number of signal events in the data set). The n c events are then fit using the unbinned maximum 
likelihood method to obtain estimators for the parameters, a, in the PDF 

^,«) = TT ##^#^-, (2.2) 

where F s and Fj, describe the functional dependence on the reference coordinate, % r , of the signal 
and background, respectively. These distributions are normalized such that for any given set of 
estimators, a, 

J F s (^ r , a)d£, r = n sig , J F b (t; r , &)d£ r = n bkgd , (2.3) 

where n s i g (nbicgd) is the total amount of signal (background) extracted from the n c nearest neighbor 
event sample. We now return to the assumption in Section [I] that the reference and non-reference 
coordinates are uncorrected. The motivation for this assumption should now be clear. If there 
is a strong correlation between these two types of coordinates, it could lead to distortions in the 
reference-coordinate distributions constructed out of the n c nearest neighbor events resulting in a 
bias in the extracted Q-factors. In principle, this could be overcome if these distortions can properly 
be accounted for in the PDFs. While this possibility is intriguing, its validity is not tested in this 
paper as it is likely to be very problem specific. 

The <2-factor for each event is then calculated as 

Q= F >&M (2 4) 

where t, l r is the value of the event's reference coordinate and &i are the estimators for the parameters 
obtained from the event's fit. Since a separate fit is run to determine the a, values for each event 
using its n c nearest neighbors, the estimators obtained are the local values for the hyper-sphere 
around the i th event. Thus, if they vary according to the non-reference coordinates, these variations 
will be automatically accounted for provided they do not vary rapidly relative to the correlation 
distance (one of our stated assumptions in Section [I]). We note here that a similar looking construct 
involving likelihood ratios (also denoted by Q) is used in high-energy physics to determine discov- 
ery significance (see, e.g., [0]). The ratio in Eq. ([2.40 is built from terms using estimators obtained 
from the same fit; thus, it is quite different from its likelihood ratio namesake. 
If one wants to bin the data, the signal yield in a bin is obtained as 

"bin 

& = £,Qi, (2-5) 
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where nu n is the number of events in the bin. For example, to construct a histogram (of any 
dimension) of the signal, one would simply weight each event's contribution by its Q-factor. 

The metric presented in this section is sufficient for many high-energy physics analyses; how- 
ever, there are some cases where it may not be the optimal choice. One of our stated assumptions 
in Section [j] was that the distributions should not vary rapidly relative to the correlation distance 
(which is determined by the metric and statistics). This could result in the loss of some of the finer 
structure in the signal. If the distributions are expected or observed to vary more rapidly in some 
subset of the coordinates, then it may be necessary to use a metric which can handle this situation. 
Thus, for some analyses it may be necessary to weight the coordinates unequally in the metric or 
to construct a totally different metric all together. We can again compare this situation to a similar 
one in binned analyses. Consider a two-dimensional analysis where the physics is known to vary 
rapidly in coordinate x and slowly in coordinate y. In this case, one would simply bin the data finer 
in x than in y prior to analyzing it. This asymmetric binning is analogous to using unequal weights 
in the metric in our procedure. 

From this discussion one can clearly see that the choice of metric can effect the results, i.e. the 
metric effects how the n c nearest neighbor events are obtained. This is analogous to the fact that 
in a binned analysis choosing a different binning scheme can effect the results. There is no way to 
quantitatively determine what is the optimal binning. This is determined in an ad-hoc manner in 
each analysis. Typically, one wants the bin population to be high to reduce the relative statistical 
uncertainty. This dictates choosing a large bin width. If, however, the bin width is large relative to 
the variations in the distributions, then the finer structure will be lost. Thus, there are two competing 
factors which should be considered when choosing a binning. In practice, any reasonable choice 
should not effect the observables extracted from the data. 

This same ad-hoc approach is necessary in our method when choosing a metric and a value for 
n c . The statistical uncertainties on the <2-factors increase as n c decreases. Clearly, we would like 
to keep this uncertainty as small as possible. This would dictate choosing a large value for n c . If, 
however, n c is large relative to n, then the method is averaging over large fractions of phase space. 
This can result in losses of the finer structure in the distributions. Thus, the factors to consider 
when choosing a value for n c are the same as when choosing a bin width in a binned analysis. 

It cannot be stressed enough that the validity of the method presented in this paper depends 
on how rapidly the coordinates vary relative to the correlation distance. The correlation distance 
is determined by the signal and background PDFs, the metric and the value of n c . Unfortunately, 
it is not possible to provide a general prescription for either the metric or the value of n c that will 
guarantee the validity of the method for every analysis. Instead, the choice of metric and n c must 
be studied in the data and also in Monte Carlo data (see Section As in a binned analysis, 
any reasonable choice should not effect the extracted observables. We conclude this section by 
noting that for cases with very high dimensionality, it may be necessary to work in "Gaussianized" 
variables [0] instead of the measured quantities. 

3. Error Estimation 

It is also important to extract the uncertainties on the individual Q-factors so that we can obtain 
error estimates on measurable quantities. The full covariance matrix obtained from each event's fit, 
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C a , can be used to calculate the uncertainty in Q as 

When using these values to obtain errors on the signal yield in any bin, we must consider the fact 
that the nature of our procedure leads to highly-correlated results for each event and its n c nearest 
neighbors. The uncertainty on the signal yield due to signal-background separation in a bin is 
properly given by 

'J 

where the sums are over the events in the bin and p, ; - is the correlation factor between events i 
and j. This factor is equal to the fraction of shared nearest neighbor events used in calculating the 
Q-f actors for these events. 

Keeping track of these correlations can be a bit cumbersome (though, it is possible). An 
overestimate of the true uncertainty inherent in the procedure can be obtained by assuming 100% 
correlation as follows: 

nbm 

G>V = Ys Ci QU (3-3) 



How much this approximation overestimates the errors depends on the population of the bin; how- 
ever, it is often a decent estimate due to the similar constraints which factor into the choices of bin 
size and the value of n c (see Section |5331) . 

In addition to the uncertainties associated with the fits, there will also be a purely statistical 
uncertainty associated with the signal yield in each bin, given by Poisson statistics as follows: 

n bir. 



= (3-4) 



The total uncertainty on the signal yield in any bin is then obtained by adding the fit errors, calcu- 
lated using Eq. (|3.3D, in quadrature with the statistical errors obtained from Eq. (|3.4|). 

The variation of the correlation distance as a function of the non-reference coordinates makes 
proper handling of the errors an intricate task. The method for determining the uncertainty on the 
signal yield in any bin presented in this section is rigorous; however, the impact of the correlations 
between the <2-factors and their uncertainties on physical observables extracted from the data is 
not as straight-forward. For example, if the data is binned, then correlations will also be present 
between the bins (although, they should be small). It is strongly recommended that a Monte Carlo 
study be performed to ensure that the uncertainties on physical observables extracted from the data 
are being handled correctly. 



4. Extracting Observables: Event-Based Fitting using Q-Factors 

One of the primary motivations behind the development of this method was to make it possible 
to extract physical observables from multi-dimensional distributions without having to resort to 
binning the data. The Q-f actors obtained for each event above can be used in conjunction with the 
unbinned maximum likelihood method to avoid this difficulty. 
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If we could cleanly separate out the signal from the background, then the likelihood function 
would be defined as 

J? = f[W(i), (4.1) 

i 

where W is some PDF (with unknown parameters) which the data is to be fit to. We could then 
obtain estimators for the unknown parameters in W by minimizing 

-lnJSf = -|>W(|i). (4-2) 

i 

For cases where it is not possible to separate the signal and background samples, we can use the 
<2-factors to achieve the same effect by rewriting Eq. Q4.2| ) as 

-lnJSf=-£&lnW(fO, (4.3) 

i 

where the sum is now over all events (which contains both signal and background). Thus, the 
<2-factors are used to weight each event's contribution to the likelihood. 

5. Example Application 

As an example, we will consider the reaction yp — ► poo in a single (s,t) bin, i.e. a single center-of- 
mass energy and production angle bin (extending the example to avoid binning in production angle, 
or t, is discussed below). The ft) decays to 7t + 7i~7t° about 90% of the time; thus, we will assume we 
have a detector which has reconstructed yp — ► pK + n~n° events. Of course, there are production 
mechanisms other than yp — ► pco which can produce this final state and there is no selection criteria 
which can separate out events that originated from yp — ► poo. Below we will construct a toy-model 
of this situation by generating Monte Carlo events for both signal, i.e. ft) events, and background, 
i.e. non-ft) 7i + 7i~7t° events (10,000 events were generated for each). The goal of our model analysis 
is to extract the ft) polarization observables known as the spin density matrix elements, denoted by 
Pmm' (discussed below). We note here that in this example we will assume that the signal and 
background do not interfere (and generate the Monte Carlo data accordingly). In real data, for this 
reaction a small amount of interference would be expected due to the 8.44 MeV natural width of 
the ft). 

In terms of the mass of the k + k~k q system, mj, K , the ft) events were generated according to 
3-body phase space weighted by a Voigtian (a convolution of a Breit-Wigner and a Gaussian, see 
Eq. (|5.7D) to account for both the natural width of the ft) and detector resolution. For this example, 
we chose to use a = 5 MeV/c 2 for the detector resolution (see Fig. |J). The goal of our analysis 
is to extract the three measurable elements of the spin density matrix (for the case where neither 
the beam nor target are polarized) traditionally chosen to be p^ , p^_ l and Rep® . These can be 
accessed by examining the distribution of the decay products (7l + n~K°) of the ft) in its rest frame. 

For this example, we chose to work in the helicity system which defines the z axis as the 
direction of the ft) in the overall center-of-mass frame, the y axis as the normal to the production 
plane and the x axis is simply given by x = y x z. The decay angles 8 , are the polar and azimuthal 
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signal + bkgd signal bkgd 




Figure 1. (Color Online) <p (radians) vs cos 9: Generated decay angular distributions for all events (left), 
only signal events (middle) and only background events (right). 




Ma, (GeV/c 2 ) 

Figure 2. Mass of the n + n~n° system (GeV/c 2 ) for all generated events (unshaded) and for only the 
background (shaded). 

angles of the normal to the decay plane in the (O rest frame, i.e. the angles of the vector (p n + x p n -). 
The decay angular distribution of the CO in its rest frame is then given by [@] 

W(e,0) = ^Q(l-Po O o) + ^(3Po°o-l)cos 2 

— Pi— i sin" cos sin20cos<^ , (5.1) 

which follows directly from the fact that the ft) is a vector particle; it has spin-parity J p = 1 . We 
chose to use the following p^ M , values for this example: 

p£, = 0.65 (5.2a) 

p{>_ i=0.05 (5.2b) 
Rep% = 0.10 (5.2c) 
The resulting generated decay distribution is shown in Fig. |IJ. 
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For the background, we chose to generate it according to 3-body phase space weighted by a 
linear function in m 3K and 

W(0,&) = — (l + |sin0cos0|) (5.3) 

07T 

in the decay angles. When separating out the signal below, we assume we have no knowledge of 
Eq. |53] (this would typically be the case). Thus, the separation of signal and background will be 
carried out using only the knowledge that the background can be parametrized as a polynomial 
(with unknown parameters) in m 37[ . No information about the distributions of the background in 
any other variables will be used. Figure || shows the n + n~n° mass spectrum for all generated 
events and for just the background. The generated decay angular distributions for all events, along 
with only the signal and background are shown in Fig. [I]. There is clearly no selection criteria 
which can separate out the signal. 

5.1 Applying the Procedure 

To obtain the <2-factors, we first need to identify the relevant coordinates, i.e. the kinematic vari- 
ables in which we need to separate signal from background. The n + K~7i a mass will be used as the 
reference coordinate, t, r = m 3n . The stated goal of our analysis is to extract the p^j M , elements. We 
will do this using Eq. Q5.1|); thus, only the angles d,<p are relevant. Other decay variables, such as 
the distance from the edge of the 7i + 7t~7t° Dalitz plot, are not relevant to this analysis — though, 
they would be in other analyses (see Section |575| ). 

Using the notation of Section ^, f = (w^, cos 0,0). The RMS's of the relevant kinematic 
variables are 

ai= [ <p 2 d$ = 27i 3 /3 (5.4a) 

a c 2 os0 = J cos 2 0dcos0 =2/3. (5.4b) 
The distance between any two points, du, is then given by 

,2_3 



^"2 



(cOS0 ; --COS0y) 2 + ( ^ 07)21 



(5-5) 



r 3 

The functional dependence of the signal and background on the reference coordinate, m 3ll , are 

F s (m 3jc ,a) =s-V(nt3 n ,m m ,r m ,a) (5.6a) 

F b (m 37t ,a) =b l m 37t +b Q , (5.6b) 

where m m = 782.56 MeV/c 2 , r ffl = 8.44 MeV, a = 5 MeV is the simulated detector resolution, 
a = (s,b\,bo) are unknown parameters and 

r / 1 r,., M 

(5.7) 



Vim^n.mfo.Ta^) = J_ Re 
V2na 



is the convolution of a Gaussian and non-relativistic Breit-Wigner known as a Voigtian (w(z) is the 
complex error function). 

As stated above, the number of nearest neighbor events required depends on the analysis. 
Specifically, it depends on how many unknown parameters there are, along with the functional 
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(GeV/c 2 ) 

Figure 3. (Color Online) Mass of the n + n~n° system (GeV/c 2 ) for all generated events (unshaded), only 
generated background events (shaded) and all generated events weighted by 1 — Q (dashed-red). 




Figure 4. (Color Online) <j) (radians) vs cos 9: Extracted decay angular distributions for all events (left), for 
events weighted by Q, signal (middle), and for events weighted by 1 — Q, background (right). 



forms of F s and Fj. For this relatively simple case, the value n c = 100 works well (see Section |5]3] 
for discussion on the value of n c ). For each simulated event, we then find the n c closest events 
(containing both signal and background events) and perform an unbinned maximum likelihood fit, 
using the CERNLIB package MINUIT [0], to determine the estimators a. The ^-factors are then 
calculated from Eq. QZ.4Q and the uncertainties are straightforward to calculate following Section 0. 

Figure |3] shows the comparison of the extracted and generated background mj, % distributions 
integrated over all decay angles. The agreement is quite good; however, we are looking for more 
than just global agreement. Figure |] shows the extracted angular distributions for the signal and 
background. The agreement with the generated distributions is excellent (see Fig. [j]). A two- 
dimensional x 2 calculation comparing the generated and fit signal histograms yields % 2 /ndf= 0.65 
(where ndf is the degrees of freedom). This comparison may not be ideal due to the relatively large 
errors which exist on the small bin occupancies; however, it is sufficient to demonstrate the quality 
of the signal-background separation. We can also compare the <2-f act ors extracted by the fits to 
the theoretical distributions from which our data was generated. Figure |5] shows that the extracted 
values are in very good agreement with the generated ones. 
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We conclude this section by discussing the importance of quality control in the fits. For this 
example, we performed 20,000 independent fits to extract the Q-factors. To avoid problems which 
can arise due to fits not converging or finding local minima, each unbinned maximum likelihood fit 
was run with three different sets of starting values for the parameters a: (1) 100% signal; (2) 100% 
background; (3) 50% signal, 50% background. In all cases, the fit with the best likelihood was 
used. The n c events were then binned and a % 2 jndf was obtained. In about 2% of the fits the 
% 2 /ndfwa.s very large, a clear indicator that the fit had not found the best estimators a. For these 
cases, a binned % 2 fit was run to obtain the <2-factor. 

5.2 Examining the Errors 

As discussed in Section 0, the covariance matrix obtained from each event's fit can be used to 
obtain the uncertainty in Q, Oq, using Eq. (|3.1|). The nature of our procedure leads to a high degree 
of correlation between neighboring events' Q-f actors. This means that adding the uncertainties in 
quadrature would definitely underestimate the true error if the data is binned. In Section 0, we 
showed how to properly handle these uncertainties and also argued that a decent approximation 
could be obtained by assuming 100% correlation which provides an overestimate of the true error. 

To examine the error bars in our toy example, we chose to project our data into a one- 
dimensional distribution in cos0. This was done to avoid bin occupancy issues which arise in 
the two-dimensional case due to limitations in statistics. Figure ^(a) shows the comparison be- 
tween the generated and calculated cos 6 distributions. The agreement is excellent. The error bars 
on the calculated points were obtained using Eq. Q3.2|). For this study, we ignore the Poisson statis- 
tical uncertainty in the yield due to the fact that the number of generated events is known. In a real 
world analysis, these should be included in the quoted error bars. 

We can examine the quality of the error estimation by examining the difference between the 
generated and calculated yields in each bin, A*3f. Figure ^(b) shows the comparison between A&~, 
Go? obtained using Eq. fl3.2| ) and c%- obtained assuming 100% correlation. As expected, the 100% 
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Figure 6. (Color Online) (a) Signal yield vs cos 9 for generated signal events (dashed) and all generated 
events weighted by g-factors (blue circles). The error bars on the extracted yields were obtained using ( |3.2| ). 
(b) Comparison of the true error on the signal yield, = | ^ c „ — &calc I (open black squares), to the error 
bars obtained using Eq. ( |3.2| ) (gray squares) and Eq. ( |j.3| ) (blue circles). 



correlated errors provide an overestimate of MV in every bin, while the errors obtained using 
Eq. flH2l ) provide an accurate calculation of the uncertainties. 

5.3 Extracting Observables 

The goal of our model analysis is to extract the spin density matrix elements. Binning the data 
would be undesirable due to limitations in statistics. Following Section |], we can instead per- 
form an event-based maximum likelihood fit employing the <2-factors to handle the presence of 
background in our data set. The log likelihood is obtained from Eq. |3] as 

-ln^ = -£ainW(0,-,0 ; ), (5.8) 

i 

where the sum is over all events (which contains signal and background). Using the <2-factors 
obtained in Section [571], minimizing Eq. ( |5.8D yields 

p,5o = 0.659 ±0.011 (5.9a) 

p,°_, =0.044 ±0.008 (5.9b) 

Rep^ = 0. 108 ± 0.007, (5.9c) 

where the uncertainties are purely statistical (obtained from the fit covariance matrix). The values 
extracted for the spin density matrix elements are in excellent agreement with the values used 
to generate the data given in Eq. ( |5.2| ). To verify the accuracy of the statistical uncertainties, an 
ensemble of 100 example datasets was generated and the analysis procedure repeated on each 
independently. The pull distributions obtained for the spin density matrix elements are shown in 
Fig. [7[ The means and widths are consistent with the expected values. Thus, no bias is found in the 
extracted observables and the statistical uncertainties obtained from the fits are correct. 
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u=-0.11+0.13 
g=1.04±0.10 



^i=-0.08±0.13 
0=1.19+0.13 




u=0.02+0.13 
a=1.15±0.09 



(a) 

Figure 7. Pull distributions for the spin density matrix elements extracted from an ensemble of datasets gen- 
erated from the signal and background PDF's given in Section |5[ The means and widths of the distributions 
are consistent with the expected values (0 and 1, respectively). 



5.4 Choosing a Metric and a Value for n c 

In this section, we will examine the choices of the metric and of the value of n c (the number of 
nearest neighbor events used to determine the Q-factors). As stated above, it is not possible to 
provide general prescriptions for the metric and the value of n c that will guarantee the validity of 
the method for every analysis. Instead, these choices must be studied using the data and Monte 
Carlo data. Figure |8](a) shows the average relative uncertainty in Q for different choices of n c . As 
expected, this uncertainty increases as n c decreases. Keeping this uncertainty as small as possible 
dictates choosing a large value for n c . If, however, n c is large relative to n, then the method will 
average over large fractions of phase space. This can result in losses of the finer structure in the 
distributions. Thus, there are two competing factors which should be considered when choosing 
the value of n c . The ratio of n c /n must be small enough to permit a true extraction of the finer 
structure; however, the value of n c must be large enough such that the relative uncertainties in Q 
are not too large. These constraints are analogous to those used to choose a bin width in a binned 
analysis. 

Figure |8](a) shows that statistical fluctuations in the ^-factors become large (relative to vari- 
ations in the physics) for n c < 100. Figure |8](b) shows the average radius (in and cos0) of the 
hyper-spheres used to collect the n c events for different choices of n c . From Fig. |] we can estimate 
the size of the regions of phase space which are small enough such that the finer structure of the 
distributions is not lost. We then conclude that the choice n c < 200 results in sufficiently small 
hyper-spheres. Following the ad-hoc prescription discussed above, we combine these two con- 
straints which results in 100 < n c < 200. Figure [5](a) shows a comparison of the Q- factors obtained 
using n c = 100 and n c = 200. There is no apparent structure and the fluctuations are consistent 
with the uncertainties on the Q-factors. Thus, the signal distributions reconstructed from these two 
choices of n c are statistically consistent. The spin density matrix elements extracted using the Q- 
factors obtained with n c = 200 are listed in Table [I]. The values are very close to those obtained 
using n c = 100 in Section |53] and well within the statistical uncertainties. 

We now want to examine the effects of choosing a different metric. In the previous sections, the 
RMS's of the variables were used in Eq. ( [2. 1| ) to determine the distance between events. Another 
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Figure 8. (a) Average relative uncertainty in Q vs n c . (b) Average hyper- sphere radius vs n c 



possible choice would be to simply use the range of the variables, i.e. replace a cos e by 2 and 
by 2%. Figure 0(b) shows a comparison of the <2-factors obtained using RMS's in the metric and 
those obtained using ranges. We again conclude that the signal distributions reconstructed from 
these two choices are statistically consistent. The spin density matrix elements extracted using the 
<2-factors obtained with this alternative metric are listed in Table |I[ The results are again very close 
to those obtained in Section 1531 

For this example, there is no physics-based motivation for adding dependence on the coordi- 
nates to the metric; however, to further demonstrate the robustness of the method, we can replace 
the RMS's in Eq. flZTll ) by the folloing quantities: 




(5.10) 



The spin density matrix elements extracted using the Q-factors obtained with this alternative vari- 
able metric are listed in Table [I]. The results are again very close to those obtained in Section |53J 
Therefore, the extracted observables are stable provided the choices of metric and n c obey the 
ad-hoc constraints discussed above. 





generated 


signal 


n c = 100 ["RMS" : "range" : "var"] 


n c = 200["RMS"] 


pU 

Rep% 


0.65 
0.05 
0.10 


0.649 
0.043 
0.103 


[0.659 : 0.656 : 0.657] 
[0.044 : 0.044 : 0.043] 
[0.108 : 0.108 : 0.107] 


0.656 
0.042 
0.107 



Table 1. P^m' va l ues (fit anc ' generated): The signal results were obtained by fitting only the generated 
signal events (unweighted). "RMS" refers to the metric constructed using the RMS's of the variables in 
Eq. fl2.1| ), "range" refers to simply using the ranges of the variables, and "var" refers to the variable metric 
constructed using Eq. ( I5.10D (see text for details). The statistical uncertainties on the extracted quantities are 
Coo = 0.01 1, <J\-i= 0.008 and <7 10 = 0.007 for each of the fit conditions. 
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Figure 9. (Color Online) (a) AQ vs. <j) (radians) vs. cos 9: Mean difference between the g-factors extracted 
using n c = 100 and n c = 200. (b) AQ vs. (j) (radians) vs. cos0: Mean difference between the g-factors 
extracted using RMS's and ranges (see text for details) in the metric. In both cases, the differences in the 
g-factors are consistent with their statistical uncertainties, i.e. there is no sensitivity in the reconstructed 
signal distributions to these choices of n c or metric. 

5.5 Extending the Example 

To extend this example to allow for the case where the data is not binned in production angle, we 
would simply need to include cos 6® M or t in the vector of relevant coordinates, t, . To perform a full 
partial wave analysis on the data, we would also need to include any additional kinematic variables 
which factor into the partial wave amplitudes, e.g. the distance from the edge of the 7i + 7T~7r Dalitz 
plot (typically included in the ft) decay amplitude). We would then construct the likelihood from 
the partial waves and minimize — ln«£f using the Q-f actors obtained by applying our procedure 
including the additional coordinates. An example of this can be found in flTOl . 

6. Conclusions 

In this paper, we have presented a procedure for separating signals from non-interfering back- 
grounds by determining, on an event-by-event basis, a quality factor (Q-factor) that a given event 
originated from the signal distribution. This procedure is a generalization of the side-band subtrac- 
tion method to higher dimensions which does not require any binning of the data. We have shown 
that the Q-factors can be used as event weights in subsequent analysis procedures to allow more di- 
rect access to the true signal distribution. For example, the Q-factors can be used to weight the log 
likelihood in event-based unbinned maximum likelihood fits. This leads to background subtraction 
which is carried out automatically during the fits. 

The method presented in this paper is meant to be applied to analyses where the distributions 
of the signal and background are unknown. All that is required is that each can be parametrized 
in terms of (at least) one coordinate which must not be correlated with the remaining set of coor- 
dinates. While it may be possible to overcome this limitation (i.e. it may be possible to account 
for the correlations in the PDFs), this topic is not explored in this paper as it is likely to be very 
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problem specific. No knowledge about the distributions of the signal or background in any other 
coordinates is necessary. 

The validity of the method depends on the choices of two ad-hoc quantities: the metric and the 
number of nearest-neighbor events used to define each hyper-sphere. It is not possible to provide 
general prescriptions for these quantities that will guarantee the validity of the method for every 
analysis. Instead, they must be studied using the data and Monte Carlo data for each analysis 
individually to confirm that the method produces valid results. The correlation distance varies 
according to the non-reference coordinates which makes proper handling of the errors an intricate 
task. Thus, the accuracy of the uncertainties on physical observables extracted from the data should 
be verified by a Monte Carlo study as well. Given these caveats, it is clear that performing a detailed 
study using Monte Carlo data is an important prerequisite to using this method for any analysis. 
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